An XML Representation for Annotated Handwriting Datasets for Online Handwriting Recognition

نویسندگان

  • Ajay S. Bhaskarabhatla
  • Sriganesh Madhvanath
چکیده

In this paper, we briefly descibe an XML representation for annotation of online handwriting data to support the development and evaluation of handwriting recognition algorithms, that is based on the emerging Digital Ink Markup Language (InkML) draft standard from W3C. In particular, we describe how the XML representation we have defined attempts to address issues of (i) support for different scripts, (ii) partial automation of labeling using recognition engines, (iii) planned as well as casual capture of handwriting data and (iv) semantic annotation of handwriting data at various levels such as character, word and phrase. The representation keeps the raw handwriting data (described by InkML) separate from its semantic interpretations. We also compare and contrast the XML representation with the extant UNIPEN representation for annotation of handwriting data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

XPEN: An XML Based Format for Distributed Online Handwriting Recognition

Architectures for integrated and distributed handwriting recognition systems are discussed. An XML (eXtensible Markup Language) based representation for online handwriting data, referred to as XPEN, is proposed. XPEN is based on the earlier UNIPEN format. The flexibility of the new format is illustrated with an example of the use of XSLT (XSL-Transformations) to translate XPEN into the Scalable...

متن کامل

Persian Handwriting Analysis Using Functional Principal Components

Principal components analysis is a well-known statistical method in dealing with large dependent data sets. It is also used in functional data for both purposes of data reduction as well as variation representation. On the other hand "handwriting" is one of the objects, studied in various statistical fields like pattern recognition and shape analysis. Considering time as the argument,...

متن کامل

Experiences in Collection of Handwriting Data for Online Handwriting Recognition in Indic Scripts

In this paper, we describe initial efforts at Hewlett-Packard Labs, Bangalore, to create datasets of online handwriting in Indic scripts to support research in online handwriting recognition for the Indic scripts. The term “online” here refers to the fact that handwriting is captured as a stream of (x,y) points using an appropriate pen position sensor (often called a digitizer), rather than as ...

متن کامل

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

متن کامل

Isolated Persian/Arabic handwriting characters: Derivative projection profile features, implemented on GPUs

For many years, researchers have studied high accuracy methods for recognizing the handwriting and achieved many significant improvements. However, an issue that has rarely been studied is the speed of these methods. Considering the computer hardware limitations, it is necessary for these methods to run in high speed. One of the methods to increase the processing speed is to use the computer pa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004